AITopics

Country: Asia > Singapore (0.04)

Genre: Research Report (1.00)

Industry:

Law (1.00)
Information Technology > Security & Privacy (0.69)
Education > Curriculum > Subject-Specific Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.67)

Neural Information Processing Systems, Feb-11-2026, 00:26:40 GMT

34cc2ded6daba59357134c0b9fb06bfe-Paper-Datasets_and_Benchmarks_Track.pdf

buggy program, large language model, machine learning, (18 more...)

Country: Asia > Singapore (0.04)

Genre:

Research Report (0.68)
Workflow (0.49)

Industry:

Law (0.68)
Information Technology > Security & Privacy (0.48)
Education > Curriculum > Subject-Specific Education (0.46)
Education > Educational Setting (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.77)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.48)

Shahriar, Sadat, Ayoobi, Navid, Mukherjee, Arjun

The Erosion of LLM Signatures: Can We Still Distinguish Human and LLM-Generated Scientific Ideas After Iterative Paraphrasing?

arXiv.org Artificial Intelligence, Dec-8-2025

With the increasing reliance on LLMs as research agents, distinguishing between LLM- and human-generated ideas has become crucial for understanding the cognitive nuances of LLMs' research capabilities. While detecting LLM-generated text has been extensively studied, distinguishing human- from LLM-generated scientific ideas remains an unexplored area. In this work, we systematically evaluate the ability of state-of-the-art (SOTA) machine learning models to differentiate between human- and LLM-generated ideas, particularly after successive paraphrasing stages. Our findings highlight the challenges SOTA models face in source attribution, with detection performance declining by an average of 25.4% after five consecutive paraphrasing stages. Additionally, we demonstrate that incorporating the research problem as contextual information improves detection performance by up to 2.97%. Notably, our analysis reveals that detection algorithms struggle significantly when ideas are paraphrased into a simplified, non-expert style, which contributes the most to the erosion of distinguishable LLM signatures.

artificial intelligence, large language model, natural language, (14 more...)

2512.05311

Country: North America > United States > Texas (0.28)

Genre: Research Report > New Finding (1.00)

Industry: Information Technology (0.46)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

arXiv.org Artificial Intelligence, Nov-18-2025

Examining the Usage of Generative AI Models in Student Learning Activities for Software Programming

Chen, Rufeng, Jiang, Shuaishuai, Shen, Jiyun, Moon, AJung, Wei, Lili

The rise of Generative AI (GenAI) tools like ChatGPT has created new opportunities and challenges for computing education. Existing research has primarily focused on GenAI's ability to complete educational tasks and its impact on student performance, often overlooking its effects on knowledge gains. In this study, we investigate how GenAI assistance compares to conventional online resources in supporting knowledge gains across different proficiency levels. We conducted a controlled user experiment with 24 undergraduate students at two different levels of programming experience (beginner, intermediate) to examine how students interact with ChatGPT while solving programming tasks. We analyzed task performance, conceptual understanding, and interaction behaviors. Our findings reveal that generating complete solutions with GenAI significantly improves task performance, especially for beginners, but does not consistently result in knowledge gains. Importantly, usage strategies differ by experience: beginners tend to rely heavily on GenAI for task completion, often without gaining knowledge in the process, while intermediates adopt more selective approaches. We find that both over-reliance and minimal use result in weaker knowledge gains overall. Based on our results, we call on students and educators to adopt GenAI as a learning tool rather than a problem-solving tool. Our study highlights the urgent need for guidance when integrating GenAI into programming education to foster deeper understanding.
The rapid development of Generative Artificial Intelligence (GenAI) has led to its widespread adoption across various domains to boost productivity and streamline workflows. Large Language Models (LLMs), such as OpenAI's ChatGPT and Codex, Google Gemini, and GitHub Copilot, have been integrated into domains including software engineering [1], [2], healthcare [3], education [4], creative writing [5], [6], and digital music [7], offering capabilities such as code generation, question answering, and image generation. Some studies evaluated GenAI's performance on programming tasks [8], user interface design education [9], and computer vision coursework [10]. Others focused on assessing the accuracy and usability of GenAI-generated responses [11], [12].

large language model, machine learning, natural language, (20 more...)

2511.13271

Country:

North America > United States (0.30)
North America > Canada > Quebec > Montreal (0.15)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Education > Curriculum > Subject-Specific Education (0.68)
Education > Educational Setting (0.48)
Education > Educational Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (1.00)

Javahar, Jeena, Budhrani, Tanya, Basha, Manaal, de Souza, Cleidson R. B., Beschastnikh, Ivan, Rodriguez-Perez, Gema

Cracking CodeWhisperer: Analyzing Developers' Interactions and Patterns During Programming Tasks

arXiv.org Artificial Intelligence, Oct-14-2025

The use of AI code-generation tools is becoming increasingly common, making it important to understand how software developers are adopting these tools. In this study, we investigate how developers engage with Amazon's CodeWhisperer, an LLM-based code-generation tool. We conducted two user studies with two groups of 10 participants each, interacting with CodeWhisperer: the first to understand which interactions were critical to capture, and the second to collect low-level interaction data using a custom telemetry plugin. Our mixed-methods analysis identified four behavioral patterns: 1) incremental code refinement, 2) explicit instruction using natural language comments, 3) baseline structuring with model suggestions, and 4) integrative use with external sources. We provide a comprehensive analysis of these patterns.
Several IDE-based code generation tools have been released in the past few years, such as GitHub's Copilot [8], Kite [14], Amazon's CodeWhisperer [20], Tabnine [22], and WPCode [28]. Research reveals that achieving the full potential of these tools requires a certain level of guidance to ensure that the tool's output aligns with the user's goal [21].

codewhisperer, large language model, natural language, (17 more...)

2510.11516

Country: North America > United States (0.28)

Genre: Research Report > New Finding (1.00)

Industry:

Education (0.68)
Information Technology (0.48)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.71)

Neural Information Processing Systems, Oct-9-2025, 23:02:44 GMT

34cc2ded6daba59357134c0b9fb06bfe-Supplemental-Datasets_and_Benchmarks_Track.pdf

buggy program, dataset, learner, (13 more...)

Country: Asia > Singapore (0.04)

Genre: Research Report (1.00)

Industry:

Law (1.00)
Information Technology > Security & Privacy (0.69)
Government (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Communications (0.94)
Information Technology > Software (0.93)

Neural Information Processing Systems, Oct-9-2025, 23:02:40 GMT

Hints-In-Browser: Benchmarking Language Models for Programming Feedback Generation

buggy program, inference time, learner, (13 more...)

Country: Asia > Singapore (0.04)

Genre:

Research Report (0.68)
Workflow (0.49)

Industry:

Information Technology > Security & Privacy (0.48)
Education > Curriculum > Subject-Specific Education (0.46)
Education > Educational Setting (0.46)
Education > Educational Technology > Educational Software > Computer Based Training (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.68)

arXiv.org Artificial Intelligence, Oct-8-2025

Exploring Student Choice and the Use of Multimodal Generative AI in Programming Learning

Hou, Xinying, Xiao, Ruiwei, Ye, Runlong, Liut, Michael, Stamper, John

The broad adoption of Generative AI (GenAI) is impacting Computer Science education, and recent studies have identified both benefits and potential concerns when students use it for programming learning. However, most existing explorations focus on GenAI tools that primarily support text-to-text interaction. With recent developments, GenAI applications have begun supporting multiple modes of communication, known as multimodality. In this work, we explored how undergraduate programming novices choose and work with multimodal GenAI tools, and their criteria for these choices. We selected a commercially available multimodal GenAI platform for interaction, as it supports multiple input and output modalities, including text, audio, image upload, and real-time screen-sharing. Through 16 think-aloud sessions that combined participant observation with follow-up semi-structured interviews, we investigated student modality choices for GenAI tools when completing programming problems and the underlying criteria for modality selections. With multimodal communication emerging as the future of AI in education, this work aims to spark continued exploration of student interaction with multimodal GenAI in the context of CS education.

large language model, machine learning, natural language, (18 more...)

2510.05417

Country:

North America > United States > Michigan (0.28)
North America > Canada > Ontario > Toronto (0.14)

Genre: Research Report > New Finding (1.00)

Industry:

Education > Educational Technology (1.00)
Education > Educational Setting > Online (0.46)
Education > Curriculum > Subject-Specific Education (0.36)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.71)

Lee, Dongjun, Hwang, Changho, Lee, Kimin

Learning to Generate Unit Test via Adversarial Reinforcement Learning

arXiv.org Artificial Intelligence, Oct-1-2025

Unit testing is a core practice in programming, enabling systematic evaluation of programs produced by human developers or large language models (LLMs). Given the challenges in writing comprehensive unit tests, LLMs have been employed to automate test generation, yet methods for training LLMs to produce high-quality tests remain underexplored. In this work, we propose UTRL, a novel reinforcement learning framework that trains an LLM to generate high-quality unit tests given a programming instruction. Our key idea is to iteratively train two LLMs, the unit test generator and the code generator, in an adversarial manner via reinforcement learning. The unit test generator is trained to maximize a discrimination reward, which reflects its ability to produce tests that expose faults in the code generator's solutions, and the code generator is trained to maximize a code reward, which reflects its ability to produce solutions that pass the unit tests generated by the test generator. In our experiments, we demonstrate that unit tests generated by Qwen3-4B trained via UTRL show higher quality compared to unit tests generated by the same model trained via supervised fine-tuning on human-written ground-truth unit tests, yielding code evaluations that more closely align with those induced by the ground-truth tests. Moreover, Qwen3-4B trained with UTRL outperforms frontier models such as GPT-4.1 in generating high-quality unit tests, highlighting the effectiveness of UTRL in training LLMs for this task.
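The two rewards described in the abstract above can be illustrated with a toy sketch. This is not the UTRL implementation: the abstract does not give the reward formulas, so the definitions below (fraction of known-faulty solutions caught, fraction of tests passed) and every name (`make_test`, `discrimination_reward`, `code_reward`) are assumptions made for illustration only.

```python
from typing import Callable, Sequence

# Hypothetical types: a "solution" is a function under test,
# and a "unit test" returns True when the solution passes it.
Solution = Callable[[int], int]
UnitTest = Callable[[Solution], bool]


def make_test(x: int, expected: int) -> UnitTest:
    """Build a test that checks solution(x) == expected."""
    def run(solution: Solution) -> bool:
        try:
            return solution(x) == expected
        except Exception:
            return False  # a crash counts as a failed test
    return run


def passes_all(solution: Solution, tests: Sequence[UnitTest]) -> bool:
    return all(test(solution) for test in tests)


def discrimination_reward(tests: Sequence[UnitTest],
                          reference: Solution,
                          faulty: Sequence[Solution]) -> float:
    """Assumed form: fraction of known-faulty solutions that fail the suite.

    A suite that rejects the reference solution is treated as invalid
    and earns 0 (also an assumption).
    """
    if not faulty or not passes_all(reference, tests):
        return 0.0
    caught = sum(1 for s in faulty if not passes_all(s, tests))
    return caught / len(faulty)


def code_reward(solution: Solution, tests: Sequence[UnitTest]) -> float:
    """Assumed form: fraction of generated tests the solution passes."""
    if not tests:
        return 0.0
    return sum(1 for t in tests if t(solution)) / len(tests)


# Toy task: square an integer.
reference = lambda x: x * x
faulty = [lambda x: x * 2, lambda x: abs(x) * x]  # each is wrong on some inputs

weak_tests = [make_test(2, 4)]                      # both faulty solutions pass
strong_tests = [make_test(3, 9), make_test(-2, 4)]  # exposes both faults

print(discrimination_reward(weak_tests, reference, faulty))
print(discrimination_reward(strong_tests, reference, faulty))
print(code_reward(faulty[1], strong_tests))
```

In this toy setting the weak suite earns no discrimination reward because it cannot tell the faulty solutions from the reference, while the stronger suite catches both; the code reward moves in the opposite direction, which is the adversarial tension the abstract describes.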

large language model, machine learning, natural language, (19 more...)

2508.21107

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Artificial Intelligence, Sep-19-2025

SCoGen: Scenario-Centric Graph-Based Synthesis of Real-World Code Problems

Yao, Xifeng, Lang, Dongyu, Zhang, Wu, Guo, Xintong, Xie, Huarui, Ni, Yinhao, Liu, Ping, Shen, Guang, Bai, Yi, Tu, Dandan, Zhang, Changzheng

Significant advancements have been made in the capabilities of code large language models, leading to their rapid adoption and application across a wide range of domains. However, their further advancement is often constrained by the scarcity of real-world coding problems. To bridge this gap, we propose a novel framework for synthesizing code problems that emulate authentic real-world scenarios. This framework systematically integrates domain knowledge, domain skills, and coding skills, all of which are meticulously extracted from real-world programming-related datasets, including Stack Overflow and Kaggle. The extracted elements serve as the foundational building blocks for constructing code problems. To align the generated problems with practical applications, application scenarios are also mined from the aforementioned datasets. These scenarios are then utilized to construct a scenario-centric graph that interconnects domain knowledge, domain skills, and coding skills. Based on this structured representation, a sampling strategy on the graph is designed, which effectively controls the complexity and diversity of the generated code problems so that they reflect real-world challenges. Experimental results demonstrate that the proposed method consistently achieves superior performance over state-of-the-art open-source large language models of varying sizes and functionalities, including both coders and general-purpose models, across a diverse set of real-world benchmarks.
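One possible reading of sampling on a scenario-centric graph is sketched below. The abstract does not specify the graph schema or the sampling strategy, so the structure (a scenario node linked to three kinds of element nodes), the `complexity` knob, the function name `sample_problem_spec`, and all node labels are hypothetical placeholders rather than the authors' method or data.

```python
import random
from typing import Dict, List

# Hypothetical graph: each scenario node links to three kinds of element nodes.
ScenarioGraph = Dict[str, Dict[str, List[str]]]

GRAPH: ScenarioGraph = {
    "sales forecasting": {  # placeholder scenario, not taken from the paper
        "domain_knowledge": ["seasonality", "time series"],
        "domain_skills": ["feature engineering", "model evaluation"],
        "coding_skills": ["grouped aggregation", "train/test split", "plotting"],
    },
    "log analysis": {  # placeholder scenario, not taken from the paper
        "domain_knowledge": ["log levels", "timestamps"],
        "domain_skills": ["anomaly detection"],
        "coding_skills": ["regular expressions", "file streaming"],
    },
}


def sample_problem_spec(graph: ScenarioGraph,
                        complexity: int,
                        rng: random.Random) -> Dict[str, object]:
    """Pick one scenario, then draw up to `complexity` linked nodes of each kind.

    Assumption: a larger `complexity` yields a harder problem because more
    elements must be combined; varying the scenario and the drawn nodes
    provides diversity.
    """
    scenario = rng.choice(sorted(graph))
    spec: Dict[str, object] = {"scenario": scenario}
    for kind, nodes in graph[scenario].items():
        k = min(complexity, len(nodes))  # never request more nodes than exist
        spec[kind] = rng.sample(nodes, k)
    return spec


spec = sample_problem_spec(GRAPH, complexity=2, rng=random.Random(0))
print(spec)
```

The sampled specification would then be handed to a problem-writing step (for example, a prompt to an LLM), which is outside the scope of this sketch.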

large language model, machine learning, natural language, (21 more...)

2509.14281

Country: Asia (0.46)

Genre: Research Report > New Finding (0.48)

Industry:

Health & Medicine > Diagnostic Medicine (0.93)
Health & Medicine > Therapeutic Area > Neurology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.94)